Origin The present dataframe has been created for marketing analysis purposes. It assembles various personal information about 2239 customers, such as their education level, income, age, marital status, number of children at home…
It also shows their consuming habits (amount spent on wine, on sweets…) and the number of purchases made on discounted products.
There is very few context concerning this dataframe, since the source is unknown. It is not clear when these informations were registered, but probably by 2014 since the date of customers’ enrollment within the company doesn’t go further than 2014.
Aims To predict the customer’s behavior (Number of purchases made with a discount) depending on the most significant personal attributes To categorize participants in a few typical profiles (probably with PCA)
Attributes
ID: Customer’s unique identifier Year_Birth: Customer’s birth year Education: Customer’s education level Marital_Status: Customer’s marital status Income: Customer’s yearly household income Kidhome: Number of children in customer’s household Teenhome: Number of teenagers in customer’s household Dt_Customer: Date of customer’s enrollment with the company Recency: Number of days since customer’s last purchase Complain: 1 if customer complained in the last 2 years, 0 otherwise
MntWines: Amount spent on wine in last 2 years MntFruits: Amount spent on fruits in last 2 years MntMeatProducts: Amount spent on meat in last 2 years MntFishProducts: Amount spent on fish in last 2 years MntSweetProducts: Amount spent on sweets in last 2 years MntGoldProds: Amount spent on gold in last 2 years
NumDealsPurchases: Number of purchases made with a discount AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise Response: 1 if customer accepted the offer in the last campaign, 0 otherwise NumStorePurchases: Number of purchases made directly in stores
Before loading the dataset, we want to make sure we have all the necessary packages installed and loaded, and that the code can be run by anybody.
if(!require(pacman)) {
install.packages("pacman")
library(pacman)
}
suppressPackageStartupMessages(pacman::p_load(tidyverse, gtsummary, ggpubr, moments, here, sjPlot, parameters, effectsize, pander))
path = here("JULIETTE")
setwd(path)
data <- read.table("marketing_campaign.csv", header=T, sep="\t")
str(data)
## 'data.frame': 2240 obs. of 29 variables:
## $ ID : int 5524 2174 4141 6182 5324 7446 965 6177 4855 5899 ...
## $ Year_Birth : int 1957 1954 1965 1984 1981 1967 1971 1985 1974 1950 ...
## $ Education : chr "Graduation" "Graduation" "Graduation" "Graduation" ...
## $ Marital_Status : chr "Single" "Single" "Together" "Together" ...
## $ Income : int 58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
## $ Kidhome : int 0 1 0 1 1 0 0 1 1 1 ...
## $ Teenhome : int 0 1 0 0 0 1 1 0 0 1 ...
## $ Dt_Customer : chr "04-09-2012" "08-03-2014" "21-08-2013" "10-02-2014" ...
## $ Recency : int 58 38 26 26 94 16 34 32 19 68 ...
## $ MntWines : int 635 11 426 11 173 520 235 76 14 28 ...
## $ MntFruits : int 88 1 49 4 43 42 65 10 0 0 ...
## $ MntMeatProducts : int 546 6 127 20 118 98 164 56 24 6 ...
## $ MntFishProducts : int 172 2 111 10 46 0 50 3 3 1 ...
## $ MntSweetProducts : int 88 1 21 3 27 42 49 1 3 1 ...
## $ MntGoldProds : int 88 6 42 5 15 14 27 23 2 13 ...
## $ NumDealsPurchases : int 3 2 1 2 5 2 4 2 1 1 ...
## $ NumWebPurchases : int 8 1 8 2 5 6 7 4 3 1 ...
## $ NumCatalogPurchases: int 10 1 2 0 3 4 3 0 0 0 ...
## $ NumStorePurchases : int 4 2 10 4 6 10 7 4 2 0 ...
## $ NumWebVisitsMonth : int 7 5 4 6 5 6 6 8 9 20 ...
## $ AcceptedCmp3 : int 0 0 0 0 0 0 0 0 0 1 ...
## $ AcceptedCmp4 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AcceptedCmp5 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AcceptedCmp1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AcceptedCmp2 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Complain : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Z_CostContact : int 3 3 3 3 3 3 3 3 3 3 ...
## $ Z_Revenue : int 11 11 11 11 11 11 11 11 11 11 ...
## $ Response : int 1 0 0 0 0 0 0 0 1 0 ...
pander(summary(data))
| ID | Year_Birth | Education | Marital_Status |
|---|---|---|---|
| Min. : 0 | Min. :1893 | Length:2240 | Length:2240 |
| 1st Qu.: 2828 | 1st Qu.:1959 | Class :character | Class :character |
| Median : 5458 | Median :1970 | Mode :character | Mode :character |
| Mean : 5592 | Mean :1969 | NA | NA |
| 3rd Qu.: 8428 | 3rd Qu.:1977 | NA | NA |
| Max. :11191 | Max. :1996 | NA | NA |
| NA | NA | NA | NA |
| Income | Kidhome | Teenhome | Dt_Customer |
|---|---|---|---|
| Min. : 1730 | Min. :0.0000 | Min. :0.0000 | Length:2240 |
| 1st Qu.: 35303 | 1st Qu.:0.0000 | 1st Qu.:0.0000 | Class :character |
| Median : 51382 | Median :0.0000 | Median :0.0000 | Mode :character |
| Mean : 52247 | Mean :0.4442 | Mean :0.5062 | NA |
| 3rd Qu.: 68522 | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | NA |
| Max. :666666 | Max. :2.0000 | Max. :2.0000 | NA |
| NA’s :24 | NA | NA | NA |
| Recency | MntWines | MntFruits | MntMeatProducts |
|---|---|---|---|
| Min. : 0.00 | Min. : 0.00 | Min. : 0.0 | Min. : 0.0 |
| 1st Qu.:24.00 | 1st Qu.: 23.75 | 1st Qu.: 1.0 | 1st Qu.: 16.0 |
| Median :49.00 | Median : 173.50 | Median : 8.0 | Median : 67.0 |
| Mean :49.11 | Mean : 303.94 | Mean : 26.3 | Mean : 166.9 |
| 3rd Qu.:74.00 | 3rd Qu.: 504.25 | 3rd Qu.: 33.0 | 3rd Qu.: 232.0 |
| Max. :99.00 | Max. :1493.00 | Max. :199.0 | Max. :1725.0 |
| NA | NA | NA | NA |
| MntFishProducts | MntSweetProducts | MntGoldProds | NumDealsPurchases |
|---|---|---|---|
| Min. : 0.00 | Min. : 0.00 | Min. : 0.00 | Min. : 0.000 |
| 1st Qu.: 3.00 | 1st Qu.: 1.00 | 1st Qu.: 9.00 | 1st Qu.: 1.000 |
| Median : 12.00 | Median : 8.00 | Median : 24.00 | Median : 2.000 |
| Mean : 37.53 | Mean : 27.06 | Mean : 44.02 | Mean : 2.325 |
| 3rd Qu.: 50.00 | 3rd Qu.: 33.00 | 3rd Qu.: 56.00 | 3rd Qu.: 3.000 |
| Max. :259.00 | Max. :263.00 | Max. :362.00 | Max. :15.000 |
| NA | NA | NA | NA |
| NumWebPurchases | NumCatalogPurchases | NumStorePurchases | NumWebVisitsMonth |
|---|---|---|---|
| Min. : 0.000 | Min. : 0.000 | Min. : 0.00 | Min. : 0.000 |
| 1st Qu.: 2.000 | 1st Qu.: 0.000 | 1st Qu.: 3.00 | 1st Qu.: 3.000 |
| Median : 4.000 | Median : 2.000 | Median : 5.00 | Median : 6.000 |
| Mean : 4.085 | Mean : 2.662 | Mean : 5.79 | Mean : 5.317 |
| 3rd Qu.: 6.000 | 3rd Qu.: 4.000 | 3rd Qu.: 8.00 | 3rd Qu.: 7.000 |
| Max. :27.000 | Max. :28.000 | Max. :13.00 | Max. :20.000 |
| NA | NA | NA | NA |
| AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 |
|---|---|---|---|
| Min. :0.00000 | Min. :0.00000 | Min. :0.00000 | Min. :0.00000 |
| 1st Qu.:0.00000 | 1st Qu.:0.00000 | 1st Qu.:0.00000 | 1st Qu.:0.00000 |
| Median :0.00000 | Median :0.00000 | Median :0.00000 | Median :0.00000 |
| Mean :0.07277 | Mean :0.07455 | Mean :0.07277 | Mean :0.06429 |
| 3rd Qu.:0.00000 | 3rd Qu.:0.00000 | 3rd Qu.:0.00000 | 3rd Qu.:0.00000 |
| Max. :1.00000 | Max. :1.00000 | Max. :1.00000 | Max. :1.00000 |
| NA | NA | NA | NA |
| AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|
| Min. :0.00000 | Min. :0.000000 | Min. :3 | Min. :11 | Min. :0.0000 |
| 1st Qu.:0.00000 | 1st Qu.:0.000000 | 1st Qu.:3 | 1st Qu.:11 | 1st Qu.:0.0000 |
| Median :0.00000 | Median :0.000000 | Median :3 | Median :11 | Median :0.0000 |
| Mean :0.01339 | Mean :0.009375 | Mean :3 | Mean :11 | Mean :0.1491 |
| 3rd Qu.:0.00000 | 3rd Qu.:0.000000 | 3rd Qu.:3 | 3rd Qu.:11 | 3rd Qu.:0.0000 |
| Max. :1.00000 | Max. :1.000000 | Max. :3 | Max. :11 | Max. :1.0000 |
| NA | NA | NA | NA | NA |
data %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~key, scales = "free") +
geom_histogram()
We are only interested in the total number of promotions accepted by the customers, since we don’t have details about the nature of each promotion.
data$AcceptedCmpTotal <- data$AcceptedCmp1 + data$AcceptedCmp2 + data$AcceptedCmp3 + data$AcceptedCmp4 + data$AcceptedCmp5 + data$Response
The dataframe contains many variables, some are superfluous for our analysis (web visits and purchases, cumplains, catalog purchases, Z_Revenus and Z_CostContact which we don’t have information about)
data$Complain <- data$NumWebVisitsMonth <- data$NumWebPurchases <- data$NumCatalogPurchases <- data$Z_Revenue <- data$Z_CostContact <- data$AcceptedCmp1 <- data$AcceptedCmp2 <- data$AcceptedCmp3 <- data$AcceptedCmp4 <- data$AcceptedCmp5 <- data$Response <- NULL
We want to calculate the age of the customers. If we proceed with : “2021 - data$Year_Birth”, we would get their current age. It makes more sense to get their age at the moment the data was registered, so we proceed with 2014 minus dataYear_Birth, although here we are only assuming that it was indeed registered in 2014.
data$age <- 2014 - data$Year_Birth
plot(data$age)
We see 3 outliers who seems to be older than 110 years old. The corresponding birth years are 1893, 1900 and 1899. The first one could be corrected by 1993, the second one would be due to 2 typing errors which is improbable, and the third could be replaced by 1999 but it corresponds to someone who has a PhD education level, which is unlikely at age 15. Since the dataset is very big, we can choose to delete these lines.
# which(data$age>110)
data <- data[-c(193, 240, 340),]
Marital status can be simplified in only a few levels, and transformed into a factor. Since the “other” section represents less than 1% of the participants, it is not enough to model it as a factor.
We then transform some relevant variables into factors.
data$Marital_Status <- factor(data$Marital_Status, labels = c("Other", "Single", "Single", "Married", "Single", "Together", "Single", "Other"))
data$Marital_Status[data$Marital_Status=="Other"] <- NA; data$Marital_Status = droplevels(data$Marital_Status)
data$ID <- factor(data$ID)
data$Education <- factor(data$Education)
data$Teenhome <- factor(data$Teenhome)
data$AcceptedCmpTotal <- factor(data$AcceptedCmpTotal)
data$Kidhome <- factor(data$Kidhome, labels = c("no", "yes", "yes"))
For Kidhome, we fused the answers “1” and “2” because there are only 2% of “2” which is not enough information to model it a one separate factor.
We now want to plot all the variables again, and check again whether anything is abnormal.
data %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~key, scales = "free") +
geom_histogram()
Here we will be eyeballing the relationship between various variables, and determine those that seem the most relevant for further analysis.
data %>%
filter(!is.na(Marital_Status)) %>%
ggplot(aes(MntWines, Marital_Status)) + geom_boxplot(na.rm = TRUE)
ggplot(data, aes(Kidhome, MntSweetProducts))+ geom_boxplot(outlier.colour = "red") + geom_point(position = position_jitter())
ggplot(data, aes(x=Income, y=NumStorePurchases)) +
geom_violin(trim=FALSE, fill='#A4A4A4', color="darkred")+
theme_minimal()
ggplot(data, aes(x=MntWines, y=Marital_Status)) +
geom_violin(trim=FALSE, fill='#A4A4A4', color="darkred")+
geom_boxplot(width=0.05) + theme_minimal()
NumStorePurchases seems to be the most useful variable to analyse since marketing analysis might want to determine which profils buy the most in the store. In order to consider NumStorePurchases as a response variables for a linear model, we first have to check normality.
ggplot(data, aes(x = NumStorePurchases)) +
geom_histogram(aes(y = ..density..),
colour = 1, fill = "white") +
geom_density(adjust = 1) + labs(title = "Purchases",
caption = paste("skewness =", round(moments::skewness(data$NumStorePurchases, na.rm = TRUE),2)))
# put that into appendix!
# ggplot(data, aes(x = sqrt(NumStorePurchases))) +
# geom_histogram(aes(y = ..density..),
# colour = 1, fill = "white") +
# geom_density(adjust = 1) + labs(title = "Purchases",
# caption = paste("skewness =", round(moments::skewness(data$NumStorePurchases, na.rm = TRUE),2)))
m1 <- lm(data=data, NumStorePurchases ~ Kidhome*Income*Education*age)
plot(m1, c(1:2,4), ask=F)
ggplot(data, aes(NumStorePurchases, Income)) + geom_point(color="blue", alpha=0.3, position = position_jitter())
# plot(Income ~NumStorePurchases, col="lightblue", pch=19, cex=2,data)
# text(Income ~NumStorePurchases, labels=ID,data, cex=0.9, font=1)
Here we observe peculiar outliers for “Income”. One income is equal to 666’666, and when income is higher than 150’000
newdf = data %>%
filter(!ID %in% c(9432, 5555, 4619, 5336, 1501, 1503, 8475, 4931, 11181) )
# plot(Income ~NumStorePurchases, col="lightblue", pch=19, cex=2,data=newdf) # odnt need that anbynmore
m1 <- lm(data=newdf, NumStorePurchases ~ Kidhome*Income*Education*age)
plot(m1, c(1:2,4), ask=F)
Here we use the stepAIC function to select the model that has the best AIC.
ms <- MASS::stepAIC(m1, direction = "both", trace = FALSE) #il choisit le meilleur AIC
ms$anova
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## NumStorePurchases ~ Kidhome * Income * Education * age
##
## Final Model:
## NumStorePurchases ~ Kidhome + Income + Education + age + Kidhome:Income +
## Kidhome:Education + Income:Education + Kidhome:age + Income:age +
## Education:age + Kidhome:Income:Education + Income:Education:age
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 2164 11138.08 3650.691
## 2 - Kidhome:Income:Education:age 4 26.205697 2168 11164.28 3647.870
## 3 - Kidhome:Education:age 4 14.097325 2172 11178.38 3642.652
## 4 - Kidhome:Income:age 1 8.240739 2173 11186.62 3642.276
finalm1 <- lm(data=newdf,NumStorePurchases ~ Kidhome + Income + Education + age + Kidhome:Income + Kidhome:Education + Income:Education + Kidhome:age + Income:age + Education:age + Kidhome:Income:Education + Income:Education:age)
We first use the eta_squared function to compute the effect sizes: everything that has 0.00 on the left of the 90% CI column has a “meaningless” effect size.
parameters::model_parameters(anova(finalm1))
## Parameter | Sum_Squares | df | Mean_Square | F | p
## ------------------------------------------------------------------------------
## Kidhome | 6242.08 | 1 | 6242.08 | 1212.52 | < .001
## Income | 5446.16 | 1 | 5446.16 | 1057.92 | < .001
## Education | 10.29 | 4 | 2.57 | 0.50 | 0.736
## age | 24.65 | 1 | 24.65 | 4.79 | 0.029
## Kidhome:Income | 4.87 | 1 | 4.87 | 0.95 | 0.331
## Kidhome:Education | 31.63 | 4 | 7.91 | 1.54 | 0.189
## Income:Education | 26.11 | 4 | 6.53 | 1.27 | 0.280
## Kidhome:age | 36.69 | 1 | 36.69 | 7.13 | 0.008
## Income:age | 5.97 | 1 | 5.97 | 1.16 | 0.282
## Education:age | 11.41 | 4 | 2.85 | 0.55 | 0.696
## Kidhome:Income:Education | 52.82 | 4 | 13.20 | 2.56 | 0.037
## Income:Education:age | 49.16 | 4 | 12.29 | 2.39 | 0.049
## Residuals | 11186.62 | 2173 | 5.15 | |
##
## Anova Table (Type 1 tests)
effectsize::eta_squared(car::Anova(finalm1, type = 2), ci = 0.9, alternative = "two") #modified this a bit
## # Effect Size for ANOVA (Type II)
##
## Parameter | Eta2 (partial) | 90% CI
## --------------------------------------------------------
## Kidhome | 0.05 | [0.04, 0.07]
## Income | 0.31 | [0.29, 0.34]
## Education | 4.56e-04 | [0.00, 0.00]
## age | 2.42e-03 | [0.00, 0.01]
## Kidhome:Income | 1.20e-03 | [0.00, 0.00]
## Kidhome:Education | 2.86e-03 | [0.00, 0.01]
## Income:Education | 2.28e-03 | [0.00, 0.01]
## Kidhome:age | 3.98e-03 | [0.00, 0.01]
## Income:age | 7.07e-04 | [0.00, 0.00]
## Education:age | 1.07e-03 | [0.00, 0.00]
## Kidhome:Income:Education | 5.30e-03 | [0.00, 0.01]
## Income:Education:age | 4.38e-03 | [0.00, 0.01]
sjPlot::plot_model(finalm1)
sjPlot::plot_model(finalm1, type = "pred", terms = "Kidhome", show.data = T, jitter = 1)
sjPlot::plot_model(finalm1, type = "pred", terms = "Income", show.data = T, jitter = 1)
sjPlot::plot_model(finalm1, type = "pred", terms = c("Income", "Kidhome"), show.data = T, jitter = 1)
sjPlot::plot_model(finalm1, type = "pred", terms = c("Income", "Education"), show.data = T, jitter = 1)
sjPlot::plot_model(finalm1, type = "pred", terms = c("Income", "age"), show.data = T, jitter = 1)
sjPlot::plot_model(finalm1, type = "pred", terms = c("age", "Kidhome"), show.data = T, jitter = 1)
sjPlot::tab_model(effectsize::standardize(finalm1), rm.terms = c("*Education.Q", "Education^4", "Income:Education.C", "Education.Q" , "Kidhomeyes:Education.Q", "Income:Education.Q", "Kidhomeyes:Income:Education.Q", "Education.C" , "Kidhomeyes:Education.C" , "Income:Education.C" ,"Kidhomeyes:Income:Education.C", "Education^4", "Kidhomeyes:Education^4", "Income:Education^4" , "Kidhomeyes:Income:Education^4"), show.intercept = F) #what is this for? # this to show all the STANDARDIZED estimates on a numeric form in a table summarizing all thE information
| Num Store Purchases | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| Kidhome [yes] | -0.58 | -0.83 – -0.32 | <0.001 |
| Income | 0.61 | 0.48 – 0.74 | <0.001 |
| Education [Basic] | -0.62 | -2.03 – 0.79 | 0.391 |
| Education [Graduation] | -0.07 | -0.23 – 0.08 | 0.356 |
| Education [Master] | -0.11 | -0.29 – 0.08 | 0.254 |
| Education [PhD] | 0.04 | -0.14 – 0.22 | 0.663 |
| age | 0.03 | -0.09 – 0.14 | 0.668 |
| Kidhome [yes] * Income | -0.11 | -0.35 – 0.13 | 0.383 |
|
Kidhome [yes] * Education [Basic] |
-0.76 | -3.00 – 1.47 | 0.503 |
|
Kidhome [yes] * Education [Graduation] |
0.17 | -0.10 – 0.45 | 0.214 |
|
Kidhome [yes] * Education [Master] |
0.30 | -0.01 – 0.61 | 0.061 |
|
Kidhome [yes] * Education [PhD] |
0.09 | -0.22 – 0.40 | 0.555 |
|
Income * Education [Basic] |
-0.32 | -1.28 – 0.65 | 0.522 |
|
Income * Education [Graduation] |
-0.03 | -0.18 – 0.12 | 0.684 |
|
Income * Education [Master] |
0.01 | -0.16 – 0.18 | 0.894 |
| Income * Education [PhD] | -0.17 | -0.34 – -0.00 | 0.048 |
| Kidhome [yes] * age | -0.12 | -0.20 – -0.04 | 0.003 |
| Income * age | 0.03 | -0.07 – 0.13 | 0.570 |
| Education [Basic] * age | -0.66 | -1.79 – 0.47 | 0.253 |
|
Education [Graduation] * age |
-0.03 | -0.15 – 0.09 | 0.644 |
| Education [Master] * age | 0.02 | -0.12 – 0.16 | 0.761 |
| Education [PhD] * age | -0.06 | -0.20 – 0.07 | 0.342 |
|
(Kidhome [yes] * Income) * Education [Basic] |
-0.63 | -2.07 – 0.81 | 0.392 |
|
(Kidhome [yes] * Income) * Education [Graduation] |
0.12 | -0.14 – 0.38 | 0.374 |
|
(Kidhome [yes] * Income) * Education [Master] |
0.23 | -0.08 – 0.54 | 0.153 |
|
(Kidhome [yes] * Income) * Education [PhD] |
0.44 | 0.13 – 0.76 | 0.005 |
|
(Income * Education [Basic]) * age |
-0.47 | -1.19 – 0.26 | 0.209 |
|
(Income * Education [Graduation]) * age |
-0.09 | -0.20 – 0.02 | 0.113 |
|
(Income * Education [Master]) * age |
-0.04 | -0.16 – 0.09 | 0.574 |
|
(Income * Education [PhD]) * age |
0.02 | -0.10 – 0.15 | 0.716 |
| Observations | 2204 | ||
| R2 / R2 adjusted | 0.516 / 0.510 | ||
clean_data <-data[rowSums(is.na(data))==0, ]
pm1<-prcomp(clean_data[,-c(1,3,4,6,7,8,18)], scale=TRUE)
# summary(pm1)
data2 <- cbind(clean_data, pm1$x) #double check!
# str(data2)
sjPlot::tab_pca(pm1, show.cronb = F, show.var = T, nmbr.fctr = 8)
| Component 1 | Component 2 | Component 3 | Component 4 | Component 5 | Component 6 | Component 7 | Component 8 | |
|---|---|---|---|---|---|---|---|---|
| Year_Birth | 0.00 | -1.00 | 0.02 | -0.01 | 0.05 | -0.02 | 0.06 | -0.00 |
| Income | -0.14 | 0.12 | 0.05 | -0.02 | -0.81 | 0.10 | -0.29 | 0.20 |
| Recency | 0.00 | 0.01 | -0.00 | 1.00 | -0.00 | 0.01 | -0.00 | 0.01 |
| MntWines | -0.15 | 0.09 | 0.00 | 0.02 | -0.52 | 0.18 | -0.70 | 0.01 |
| MntFruits | -0.86 | -0.01 | 0.06 | -0.01 | -0.15 | 0.11 | -0.23 | 0.12 |
| MntMeatProducts | -0.52 | -0.02 | 0.04 | 0.03 | -0.71 | 0.09 | -0.17 | 0.10 |
| MntFishProducts | -0.71 | 0.02 | 0.08 | -0.01 | -0.26 | 0.22 | -0.13 | 0.31 |
| MntSweetProducts | -0.40 | -0.01 | 0.05 | 0.02 | -0.24 | 0.12 | -0.19 | 0.83 |
| MntGoldProds | -0.23 | 0.03 | -0.04 | 0.01 | -0.15 | 0.93 | -0.18 | 0.10 |
| NumDealsPurchases | 0.09 | 0.04 | -0.99 | 0.00 | 0.05 | 0.04 | -0.05 | -0.04 |
| NumStorePurchases | -0.28 | 0.08 | -0.07 | -0.01 | -0.19 | 0.12 | -0.85 | 0.20 |
| age | -0.00 | 1.00 | -0.02 | 0.01 | -0.05 | 0.02 | -0.06 | 0.00 |
| Proportion of Variance | 36.63 % | 16.90 % | 9.14 % | 8.35 % | 6.83 % | 5.44 % | 4.09 % | 3.57 % |
| Cumulative Proportion | 36.63 % | 53.53 % | 62.67 % | 71.01 % | 77.84 % | 83.28 % | 87.36 % | 90.94 % |
| varimax-rotation | ||||||||
We need 8 principal components to attain 0,90